Hierarchical and reduced index structures for multimedia files

ABSTRACT

Playback and distribution systems and methods for multimedia files are provided. The multimedia files are encoded with indexes associated with the content data of the multimedia files. Through the use of the indexes, playback of the content is enhanced without significantly increasing the file size of the multimedia file.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation application of U.S. patent application Ser. No. 12/272,631 filed Nov. 17, 2008, which application claims priority to U.S. Provisional Application No. 60/988513, filed Nov. 16, 2007, the disclosure of which is incorporated herein by reference.

BACKGROUND

The present invention relates generally to multimedia files and more specifically to the indexing of information within a multimedia file.

In recent years, the playback of multimedia files has become an integrated part of the average consumer's daily life. Cellular telephones, DVD players, personal computers, and portable media players are all examples of devices that are capable of playing a variety of multimedia files. While each device may be tailored to a particular multimedia format, the extensive proliferation of these devices encourages a certain level of interoperability amongst the different device classes and categories. Likewise, there are certain features such as fast-forward, reverse, start, stop, play, and pause which are expected to behave similarly across all device categories, despite their performance capabilities and use-case application.

One of the most common features of media playback devices is the support for random access, fast-forward and reverse playback of a multimedia file, which is sometimes referred to as “trick play”. Performing trick play functionality generally requires displaying the video presentation at a higher speed in forward and reverse direction, and resuming the overall presentation from a position close to where the viewer terminated the video trick play activity. The audio, subtitle, and other elements of the presentation are typically not used during trick play operations, even though that can be subject to a device's operating preference. In accommodating trick play functionality, multimedia files typically contain an index section used to determine the location of all frames, and specifically the video frames which can be independently decoded and presented to the viewer. When all index information is stored in a single location within a file and linearly references the multimedia information within the file, a player must seek to a specific index entry in order to be able to play a file. For example, a player that is instructed to play a multimedia presentation at the half-way point of the presentation typically processes the first half of the index data before being able to determine the set of data points required to commence playing.

The index section has many other potential applications as well: it may be a necessary element in basic playback of multimedia files that exhibit poor multiplexing characteristics; the index section may also be used to skip over non-essential information in the file; also, an index is often required for the resumption of playback after the termination of trick play functions.

SUMMARY

Embodiments of the invention utilize indexes that can increase the efficiency with which a player can perform a variety of functions including trick play functions. In several embodiments, the index is a hierarchical index. In many embodiments, the index is a reduced index and, in a number of embodiments, the index is expressed using bit field flags and associated data fields.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a graphical representation of an index structure within a multimedia file in accordance with an embodiment of the invention.

FIG. 2A is a graphical representation of an index structure following the audio/video data of a multimedia file in accordance with an embodiment of the invention.

FIG. 2B is a graphical representation of an index structure interleaved within the audio/video data of a multimedia file in accordance with another embodiment of the invention.

FIG. 2C is a detailed graphical representation of an index structure relative to other portions of a multimedia file in accordance with an embodiment of the invention.

FIG. 2D is a graphical representation of an index structure relative to cue data of a multimedia file in accordance with an embodiment of the invention.

FIG. 3 is a graphical representation of an index structure detailing bit flags and associated data fields within a multimedia file in accordance with an embodiment of the invention.

FIG. 4 is a graphical representation of an index structure providing time codes and offset data fields within a multimedia file in accordance with an embodiment of the invention.

FIG. 5 is a graphical representation of an index structure with time codes and multiple offset data fields relative to a size data field within a multimedia file in accordance with an embodiment of the invention.

FIG. 6 is a graphical representation of an index structure with time codes and primary offset data fields within a multimedia file in accordance with an embodiment of the invention.

FIG. 7 is a semi-schematic network diagram of a playback system for streaming and fixed media file playback in accordance with an embodiment of the invention.

FIG. 8 is a flowchart of a process utilizing an index structure within a multimedia file in accordance with an embodiment of the invention.

FIGS. 9-11 are graphical representations with increasing detail of an index structure within a multimedia file in accordance with one embodiment of the invention and further illustrate the process of FIG. 8.

DETAILED DESCRIPTION

Turning now to the drawings, multimedia files including indexes in accordance with embodiments of the invention are described. In a number of embodiments, the index is a hierarchical index. A hierarchical index is a representation of index information in a form that provides a coarse index to a few predetermined locations within the multimedia file followed by a further refined representation of the portions of the multimedia file. In many embodiments, the lowest level of the index is sufficiently granular as to identify every frame in the multimedia file. When a hierarchical index is used, a player need only request a small amount of relevant index information in order to commence playing a multimedia file. As such, the hierarchical index lowers the memory footprint needed by playback devices to effectively seek and perform trick-play operations on a multimedia file. Additionally, file load times for playback are reduced and trick-track load performance enhanced. In one embodiment, the hierarchical index has index information that includes offsets into cue points within a multimedia file, together with timestamps that allow lookups to be fast and efficient.

In several embodiments, the multimedia file includes a reduced index. Players in accordance with embodiments of the invention can utilize a reduced index to rapidly move between access or key-frames when performing trick play functions. The reduced index can be used in conjunction with a hierarchical index. However, reduced indexes can be included in multimedia files that do not include a hierarchical index. A reduced index only provides the location of the access or key-frames within a multimedia file, along with a time-stamp value to indicate their corresponding time within the multimedia presentation. In a number of embodiments, bit field flags and associated data fields are used to represent index information. Such a representation can be used in accordance with embodiments of the invention to express index information, a hierarchical index and/or a reduced index.

Hierarchical Indexes

A multimedia file containing a hierarchical index in accordance with an embodiment of the invention is shown in FIG. 1. The multimedia file 10 includes header information 12, index information 14 interleaved amongst audio/video data 16 and a three layer hierarchical index. The coarsest layer 18 of the hierarchical index includes a small number of references to pieces of index information. The middle layer 20 and the finest layer 22 each include successively larger numbers of references to index information.

In many embodiments, the index information 14 interleaved amongst the audio/video data 16 lists the location of encapsulated audio, video, subtitle, and/or other similar data. Typically, each block of interleaved index information lists the encapsulated media immediately following the block of interleaved index information. In several embodiments, the index information 14 contains information that describes the absolute or relative location of the start of each piece of encapsulated media. In a number of embodiments, the interleaved index information 14 includes the size of each indexed piece of encapsulated media, in addition to information indicating whether the indexed piece of encapsulated media can be used as an access or key-frame, its presentation time value, and other information, which may be helpful to a decoding device.

Each layer in the hierarchical index includes references to the interleaved index information 14 within the multimedia file 10. The implementation of the hierarchy structure can be inclusive or exclusive, meaning that the data in each layer can be repeated in the other layers or each layer may contain unique position information. In addition, the number of elements at each layer of a hierarchy and the total number of layers can be pre-determined, limited based on pre-determined values, or unbounded.

Although a specific implementation of a hierarchical index is shown in FIG. 1, hierarchical indexes can be implemented in many different ways. For example, the index values can be stored in a single part of the file, or distributed in clusters in the file. Multimedia files containing different distributions of index information in accordance with embodiments of the invention are shown in FIGS. 2A-B. For example, the index information could be appended or pre-pended to the audio/video data portion 16 of the multimedia file 10 as an entire unit 21. Index clusters 22 shown in FIG. 2B can also be woven into the audio/video data portion. In addition to distributing index information in different ways, the hierarchy itself can be implemented as a structure that points to the actual frames in a file (as opposed to blocks of index information), which may or may not start with access or key-frame positions.

FIG. 2C further details the hierarchical index 21 within a larger hypothetical file structure, MKV file 200. This file structure is made of two primary sections, the EBML 24 and the Segment 26. In this file structure, the Segment may host the Seek Head 201, Segment Info 202, Tracks 203, Chapters 204, Cluster 205, Cues 29, and a Hierarchical Index 21. As shown, a plurality of hierarchical indexes 21 could be included with the multimedia file. Additionally, each hierarchical index can include multiple hierarchical index points 23. These index points in various embodiments have a timestamp 25 and a track position 27, specifying a specific media track 27a and a position or offset 27b from the timestamp 25. Cues 29 are also shown and, as will be explained in more detail below, are utilized by the index points 23 to speed up access to specific points within a multimedia file. This dynamic structure is shown, for example, in FIG. 2D, where multiple hierarchical index points 23 reference or point to multiple cue points 28. In various embodiments, the hierarchical index contains references to a fraction, e.g., one tenth, of cue points relative to the total number of cue points in a media file. One would appreciate that the number of references can be increased to increase the granularity of pointers or references to the cue points.
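
As a minimal sketch of the relationship between hierarchical index points and cue points described above, the following Python fragment references every tenth cue point. The class names, the `stride` parameter, the use of seconds as the time unit, and the use of a list position in place of a byte offset into the Cues are all assumptions made for illustration; none of them is taken from the file structure of FIG. 2C.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CuePoint:
    time: int         # cue time in seconds (unit chosen for illustration)
    cluster_pos: int  # byte offset of the cluster holding that time


@dataclass
class HierarchicalPoint:
    time: int          # timestamp copied from the referenced cue point
    track: int         # media track the point applies to
    cue_position: int  # where to start reading the cues (a list position here)


def build_hierarchical_index(cues: List[CuePoint], stride: int = 10,
                             track: int = 1) -> List[HierarchicalPoint]:
    """Reference every `stride`-th cue point; a smaller stride is more granular."""
    if stride < 1:
        raise ValueError("stride must be at least 1")
    return [HierarchicalPoint(cue.time, track, position)
            for position, cue in enumerate(cues)
            if position % stride == 0]


# One cue per second for 100 seconds, indexed at one tenth of the cue points.
cues = [CuePoint(time=t, cluster_pos=4096 + 1000 * t) for t in range(100)]
index = build_hierarchical_index(cues, stride=10)
```

Lowering `stride` in this sketch corresponds to increasing the number of references and therefore the granularity of the index.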

A player attempting to decode a multimedia file that includes a hierarchical index in accordance with an embodiment of the invention typically uses the hierarchical index as necessitated by the functions the player is requested to perform. When trick play functions are requested, the player can locate an index in the hierarchy corresponding to a specific speed and decode each of the frames indicated by the index. The manner in which a specific frame is located using the index depends upon the nature of the index. In embodiments where each index in the hierarchy points directly to video frames, the process is simple. In embodiments where the index points to additional index information within the multimedia file, the additional index information is accessed and used to locate a desired frame.
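
One way to picture the selection of an index layer by trick play speed is sketched below. It assumes an inclusive hierarchy in which each layer points directly to key-frame offsets, and it assumes hypothetical layer strides of 16, 4 and 1; the text does not specify how layers map to speeds, so the rule used here (largest stride not exceeding the speed) is only one plausible choice.

```python
from typing import Dict, List, Sequence


def build_layers(key_frame_offsets: List[int],
                 strides: Sequence[int] = (16, 4, 1)) -> Dict[int, List[int]]:
    """Inclusive hierarchy: each layer keeps every `stride`-th key-frame offset."""
    return {stride: key_frame_offsets[::stride] for stride in strides}


def pick_layer(layers: Dict[int, List[int]], speed: float) -> List[int]:
    """Choose the coarsest layer whose stride does not exceed the speed."""
    usable = [stride for stride in layers if stride <= abs(speed)]
    # Fall back to the finest layer for speeds below every stride.
    return layers[max(usable)] if usable else layers[min(layers)]


# 64 key frames at hypothetical byte offsets, 5000 bytes apart.
offsets = [5000 * i for i in range(64)]
layers = build_layers(offsets)

fast = pick_layer(layers, 32)    # coarsest layer, 4 entries
medium = pick_layer(layers, 8)   # middle layer, 16 entries
normal = pick_layer(layers, 1)   # finest layer, all 64 entries
```

Because only the chosen layer has to be read, a player in this sketch touches 4 entries rather than 64 when fast-forwarding at the highest speed.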

Reduced Indexes

Many multimedia files in accordance with embodiments of the invention use reduced index information. Reduced indexes can be used in conjunction with a hierarchical index or in multimedia files that do not include a hierarchical index. A reduced index does not include information concerning every piece of multimedia information within a multimedia file. A reduced index typically is restricted to information concerning the location of access or key-frames and the time stamp of the access or key-frames. Access frames are generally video frames that can be independently decoded, although the reduced index can be used to point to any other type of key-frame for other streams stored in the multimedia file. The reduced index can enable a player to rapidly skip between key frames when performing trick play functions.

In a number of embodiments, a reduced index is only provided for a single or primary data type and offsets are provided for each of the other streams of data contained within the file which may be related to the primary data type. The offsets can be used by a player to facilitate synchronized playback of different media. In several embodiments, each piece of index information also includes the size of the access or key-frame and the data-type of the access or key-frame. A player decoding a multimedia file that contains a reduced index in accordance with an embodiment of the invention can use the reduced index to perform trick play functions in a similar fashion to the way in which a player uses a hierarchical index. The player can sequence through the reduced index inspecting the timestamps of access or key frames to ascertain which frames to render in order to achieve a desired speed.
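
The sequencing through a reduced index described above might look like the following sketch. The `ReducedEntry` fields mirror the location, size and timestamp named in the text; the function name, the `display_interval` parameter (how long each rendered key frame stays on screen) and the selection rule are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReducedEntry:
    timestamp: float  # presentation time of the key frame, in seconds
    offset: int       # location of the key frame in the file
    size: int         # size of the key frame in bytes


def frames_for_trick_play(entries: List[ReducedEntry], start: float,
                          speed: float,
                          display_interval: float = 0.5) -> List[ReducedEntry]:
    """Pick key frames so each rendered frame advances content time by
    roughly speed * display_interval (negative speed means reverse)."""
    if speed == 0:
        raise ValueError("speed must be non-zero")
    step = abs(speed) * display_interval
    forward = speed > 0
    ordered = entries if forward else list(reversed(entries))
    picked = []
    next_time = start
    for entry in ordered:
        if forward and entry.timestamp >= next_time:
            picked.append(entry)
            next_time = entry.timestamp + step
        elif not forward and entry.timestamp <= next_time:
            picked.append(entry)
            next_time = entry.timestamp - step
    return picked


# Key frames every 2 seconds over 40 seconds of content (hypothetical values).
reduced = [ReducedEntry(float(t), 10_000 * t, 8_000) for t in range(0, 40, 2)]
ff = frames_for_trick_play(reduced, start=0.0, speed=8)
rew = frames_for_trick_play(reduced, start=38.0, speed=-8)
```

At 8x with a half-second display interval, the sketch renders every second key frame, in forward order for `ff` and in reverse order for `rew`.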

Expressing Index Information Using Bit Fields

Multimedia files in accordance with a number of embodiments of the invention utilize bit field flags and associated data fields to express index information. In many embodiments, the bit field flags are used to signal the presence of a set of corresponding variable length data fields that contain index information. Bit field flags 31 and data fields 32 that can be used to express index information concerning a piece of multimedia information in accordance with an embodiment of the invention are shown in FIG. 3. In the illustrated embodiment, a set of bit-field flags signals the presence of additional data following the flags. The bit-field flags are specified as 8 bits in their entirety, but that is not necessarily a requirement for other implementations. The first bit of the flag may indicate an Absolute/Fixed Size field 31a, which determines whether the size of the frame is read from a pre-determined set of sizes stored in a separate section of the file, or whether it is available as a series of bytes following the flags field. Two additional bits, Fixed Size Index/Byte Numbers field 31b, are used to determine the index-position of the size value or the total number of bytes used to represent the value, depending on the setting of the Absolute/Fixed Size bit or field 31a. The next bit, a Primary Offset field 31c, determines the size of the offset value, which may be the location of the frame. This bit selects between two pre-determined byte numbers, for example either a 4-byte value or an 8-byte value. Likewise, a flag may indicate the presence of another predetermined offset, e.g., a Secondary Offset 33, which can be 4 bytes and represents a relative offset from the Primary Offset value. A bit 31e indicating the presence of a timecode byte sequence may also be present, along with another bit, Key Frame Flag bit 31f, which can be used to determine the presence of access or key frames. In many embodiments, bit field flags and data fields similar to those shown in FIG. 3 are used to index the location of all frames in a multimedia file.
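
A parser for one index entry of the kind described above could be sketched as follows. The text names the flag fields but not their bit positions, byte order, field order or the width of the timecode, so every constant below is a hypothetical choice: flags are numbered from the least significant bit, integers are big-endian, the two-bit size field stores the byte count minus one, the fields follow the flags in the order size, primary offset, secondary offset, timecode, and the timecode is 4 bytes wide.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

# Hypothetical bit assignments within the 8-bit flags field.
FIXED_SIZE = 0x01      # set: size comes from a table; clear: size bytes follow
SIZE_BITS = 0x06       # table index, or (number of size bytes - 1)
OFFSET_8_BYTES = 0x08  # set: 8-byte primary offset; clear: 4-byte
HAS_SECONDARY = 0x10   # a 4-byte secondary offset follows the primary offset
HAS_TIMECODE = 0x20    # a timecode follows (4 bytes assumed here)
KEY_FRAME = 0x40       # the indexed frame is an access or key frame
FLAGS_EXT = 0x80       # a follow-on flags byte is present


@dataclass
class IndexEntry:
    size: int
    primary_offset: int
    secondary_offset: Optional[int]
    timecode: Optional[int]
    key_frame: bool


def parse_entry(buf: bytes, pos: int,
                fixed_sizes: Sequence[int]) -> Tuple[IndexEntry, int]:
    """Parse one entry starting at `pos`; return it and the next position."""
    flags = buf[pos]
    pos += 1
    if flags & FLAGS_EXT:
        raise NotImplementedError("extension flags are not handled in this sketch")
    n = (flags & SIZE_BITS) >> 1
    if flags & FIXED_SIZE:
        size = fixed_sizes[n]                      # look the size up in the table
    else:
        size = int.from_bytes(buf[pos:pos + n + 1], "big")
        pos += n + 1
    width = 8 if flags & OFFSET_8_BYTES else 4
    primary = int.from_bytes(buf[pos:pos + width], "big")
    pos += width
    secondary = None
    if flags & HAS_SECONDARY:
        secondary = int.from_bytes(buf[pos:pos + 4], "big")
        pos += 4
    timecode = None
    if flags & HAS_TIMECODE:
        timecode = int.from_bytes(buf[pos:pos + 4], "big")
        pos += 4
    return IndexEntry(size, primary, secondary, timecode,
                      bool(flags & KEY_FRAME)), pos


# A key frame with a 2-byte absolute size, a 4-byte offset and a timecode.
raw = (bytes([KEY_FRAME | HAS_TIMECODE | (1 << 1)])
       + (4660).to_bytes(2, "big")
       + (65536).to_bytes(4, "big")
       + (610).to_bytes(4, "big"))
entry, next_pos = parse_entry(raw, 0, fixed_sizes=[512, 1024, 2048, 4096])
```

The entry above occupies 11 bytes; an entry that takes its size from the table and carries no timecode would occupy only 5, which illustrates how optional fields keep the index small.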

The number of flags that can be represented via the structure shown in FIG. 3 is infinitely extensible using a “Flags Extension” bit 31g which signals the presence of a follow-on flag. Here, one bit 31h may be referred to as “Associated Offsets”. Associated offsets may then signal the presence of a byte value, which is used to determine the number of streams which correspond to the current frame. These relative offsets may use the same flag and subsequent index information for other frames in the stream, to be used for synchronization purposes. The frames identified by the relative offsets, when played back correctly, may provide a synchronized presentation of audio, video, subtitles, and other related data. The stream number value 32 often corresponds to the actual stream numbers stored in the file.

Index information represented using the two relative offset values 41a,b is shown in FIG. 4. In many embodiments, the data type for each frame is indicated for an entire group of frames, or alternatively is indicated on a frame-by-frame basis, in which case a “Data Type” field 35 is added to the index-structure. The presence of a Timecode value 37 to indicate the exact time of a frame in an overall presentation may be governed by a set of pre-determined specifications. For example, the Timecode value could be required for all video access frames; alternatively, the presence of a Timecode could be mandatory on a periodic basis for audio samples. It is only important to note that the Timecode value is optionally present and is indicated by a corresponding bit-flag.

Through a set of pre-determined rules, structures similar to those described above can be applied for the representation of hierarchical indexing in accordance with embodiments of the invention. For example, the “Primary Offset” value 50 can point to a specific index position, along with the Timecode value 52 indicating the exact time-stamp of the index. An additional bit-field 39, the “Subindex”, can point to a relative offset from the position indicated by the “Primary Offset”. This “Subindex” position 54 is a refinement from the beginning of a larger index cluster. Use of various values to construct a hierarchical index in accordance with an embodiment of the invention is shown in FIG. 5.

Bit field flags and associated data fields can also be used to represent a reduced index structure pointing to a series of access or key frames for a particular stream in a file. A reduced index in accordance with an embodiment of the invention is shown in FIG. 6. In the illustrated embodiment, the “flags” field 602 is followed by a corresponding set of size bytes 604, a “Primary Offset” value 606, and a Timecode 608. The access frames may typically be related to video frames in a file, though again this field could be defined for all stream types in a file. The structure 600 shown in FIG. 6 stores the location of all access or key-frames, and can contain the location of all related offsets for the encapsulated tracks in the file.

It is important to note that the use of flexible bit field flags enables the implementation of multiple data structures which may appear in the hierarchical, reduced, and conventional indexing schemes. The use of bit fields as flags indicating variable length data can help optimize the size of an overall index because not all members are in general required by all frames.

Referring now to FIG. 7, a progressive playback system in accordance with an embodiment of the invention is shown. The playback system 190 includes a media server 192 connected to a network 194. Media files are stored on the media server 192 and can be accessed by devices configured with a client application. In the illustrated embodiment, devices that access media files on the media server include a personal computer 196, a consumer electronics device such as a set top box 18 connected to a playback device such as a television 200, and a portable device such as a personal digital assistant 202 or a mobile phone handset. The devices and the media server 192 can communicate over a network 194 that is connected to the Internet 204 via a gateway 206. In other embodiments, the media server 192 and the devices communicate over the Internet.

The devices are configured with client applications that can request portions of media files from the media server 192 for playing. The client application can be implemented in software, in firmware, in hardware or in a combination of the above. In many embodiments, the device plays media from downloaded media files. In several embodiments, the device provides one or more outputs that enable another device to play the media. When the media file includes an index, a device configured with a client application in accordance with an embodiment of the invention can use the index to determine the location of various portions of the media. Therefore, the index can be used to provide a user with “trick play” functions. When a user provides a “trick play” instruction, the device uses the index to determine the portion or portions of the media file that are required in order to execute the “trick play” function and requests those portions from the server. In a number of embodiments, the client application requests portions of the media file using a transport protocol that allows for downloading of specific byte ranges within the media file. One such protocol is the HTTP 1.1 protocol published by The Internet Society; another is BitTorrent, available from www.bittorrent.org. In other embodiments, other protocols and/or mechanisms can be used to obtain specific portions of the media file from the media server.
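
For the HTTP 1.1 case, a byte-range request for one indexed portion can be sketched with the Python standard library as shown below. The URL, the offset and the size are hypothetical values standing in for what a client application would read from the index; the sketch only builds the request and does not contact a server.

```python
from urllib.request import Request


def range_request(url: str, offset: int, size: int) -> Request:
    """Build an HTTP request for `size` bytes starting at byte `offset`.

    The end of an HTTP byte range is inclusive, hence the `- 1`.
    """
    if offset < 0 or size <= 0:
        raise ValueError("offset must be non-negative and size positive")
    last_byte = offset + size - 1
    return Request(url, headers={"Range": f"bytes={offset}-{last_byte}"})


# Hypothetical media URL and index values.
req = range_request("http://media.example/movie.mkv", offset=1000, size=1000)
range_header = req.get_header("Range")
```

Passing `req` to `urllib.request.urlopen` would send the request; a server that honors the range answers with status 206 (Partial Content) and only the requested bytes.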

Referring to FIGS. 8-11, one embodiment of a process of utilizing the index structure is shown. A media file, e.g., MFile 120, is received from, for example, a media server based on a media file request from a playback device or in particular a playback engine of the playback device (111). Upon locating the requested media file, the media server transmits all or some portions at a time of the media file to the playback device. The playback device in one embodiment decodes the transmitted media file to locate the hierarchical index (112). In one such embodiment, referring to FIG. 9, the playback device traverses or parses the file starting from EBML (Extensible Binary Meta Language) element 128, the Segment element 129 and then the contents of the Seek Head 121 to locate the Hierarchical Index 127. As such, the Segment information 122, Tracks 123, Chapters 124, Clusters 125 and Cues 126, although they could also be parsed, can be bypassed to quickly locate the Hierarchical Index. The located Index is then loaded into memory (113). Loading the Index into memory facilitates access to locate a desired packet or frame to be displayed or accessed by the playback device.

The Hierarchical Index is small enough for many low memory playback devices, e.g., low level consumer electronic devices, to hold the entire Index in memory, thus avoiding a complex caching scheme. In cases where the Index is too large to store in memory, or where holding only part of it is generally more feasible, no loss in seek accuracy occurs. Because the Index is a lookup table or mechanism into the cues or defined seek points for each of the tracks, and not the actual seek points, dropping portions of the Index merely causes a few additional reads when searching the cues for a desired seek point. The playback device accesses the bit stream packets or frames of the transmitted media file to play the audio, video, and/or subtitles of the media file (114).

Upon a user request, e.g., a trick-play request, the playback device searches the loaded or cached Hierarchical Index to find an entry or hierarchical point equal to, or nearest and preceding, the desired time or seek point (115). In one embodiment, the particular hierarchical point is located based on the presentation time or timestamp of the content being played and the user request, e.g., the speed and/or direction of the trick-play function. In the illustrated case, FIGS. 10-11, the desired timestamp is 610 seconds within the bit stream.

FIG. 10 demonstrates a total of 6 hierarchical access points 130, starting from Index Time zero (Hierarchical Index Time 131) to Index Time 600 (Hierarchical Index Time 132), where five of the Hierarchical Points on this diagram have not been shown. After locating the closest hierarchical point to the desired seek time (in this case Index Time 600), an Index Position or offset 134 is retrieved from Track Position 133 to locate a portion of cues that contains the desired seek point (116). The playback device seeks to the located portions of cues (117) and the cues are read through until an entry equal to, or nearest and preceding, the desired time or seek point is located (118).

Utilizing the located cue, the playback device retrieves an offset value to seek and find the desired cluster (119). A block in the desired cluster that has a timestamp corresponding to the desired timestamp, e.g., 610, is located and decoded (120) for display by the playback device. The process continues until a user request stops playback of the media file.
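
Steps (115) through (119) can be sketched as a two-stage lookup, shown below for the 610-second example. The data layout is an assumption made for illustration: hierarchical points are `(time, cue position)` pairs spaced 120 seconds apart, cues are `(time, cluster offset)` pairs spaced 10 seconds apart, cue positions are list positions rather than byte offsets, and the cluster offsets are invented values. Locating the Block inside the cluster (120) is not shown.

```python
from bisect import bisect_right
from typing import List, Tuple


def floor_position(times: List[int], target: int) -> int:
    """Position of the last entry with time <= target (0 if none precedes)."""
    return max(bisect_right(times, target) - 1, 0)


def seek(hier_index: List[Tuple[int, int]], cues: List[Tuple[int, int]],
         target: int) -> Tuple[Tuple[int, int], int]:
    """Return the cue equal to, or nearest and preceding, `target`,
    plus the number of cue entries read to find it."""
    # (115) nearest preceding hierarchical point
    point_times = [time for time, _ in hier_index]
    _, cue_start = hier_index[floor_position(point_times, target)]
    # (116)-(118) read the cues forward from the position the point gives
    chosen = cues[cue_start]
    reads = 0
    for cue in cues[cue_start:]:
        if cue[0] > target:
            break
        chosen = cue
        reads += 1
    return chosen, reads


# Cues every 10 s up to 700 s; six hierarchical points from time 0 to 600.
cues = [(t, 4096 + 1000 * t) for t in range(0, 710, 10)]
hier_index = [(t, t // 10) for t in range(0, 601, 120)]

cue, reads = seek(hier_index, cues, target=610)
cluster_offset = cue[1]  # (119) offset used to seek to the desired cluster
```

In this sketch the hierarchical point at time 600 is selected and only two cue entries are read before the cue at time 610 is found, instead of scanning the cues from the beginning.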

This concept is further clarified in FIG. 11. The Hierarchical Index time of 600 is identified from the Hierarchical Index structure 127 as previously described in reference to FIG. 10. In this particular example, the Index position within the Cues structure 151 is used to access the particular Cue Point 152 which corresponds to time 610 (Cue Time 153). The Cue Point 152 through data in Track Position 154 and Cluster Position 155 generally points to the Cluster structure 160 which may host several seconds' worth of multimedia data.

The multimedia data within a Cluster 160 may be stored as a Block Group 163, where individual Blocks of data corresponding to one or more access units of the elementary audio, video, subtitle, or other multimedia information exist. As such, Clusters contain block groups but can also contain only simple blocks. In the absence of a Block Group, it may be possible that a Cluster can host individual Blocks or a Simple Block. The corresponding Cluster Position 155 from the Cue Point 152 is used to locate the Cluster 160 and the desired Block 161 can be identified based on its time stamp (Block Time 162). In a case where an exact time stamp is not matched, the Block with the closest time stamp can be identified.

The procedure for locating a Block according to a particular time may be repeated for multiple tracks of multimedia data such that all of the data in the corresponding Blocks are presented in a synchronized manner.

While the above description contains many specific embodiments of the invention, these should not be construed as limitations on the scope of the invention, but rather as an example of one embodiment thereof. Accordingly, the scope of the invention should be determined not by the embodiments illustrated, but by the appended claims and their equivalents.

1. A method of playing back content by a playback device stored in a media file, the method comprising: providing a media file to a playback device, the media file having content data, cue data and a hierarchical index, the content data having a plurality of media frames, the cue data being associated with each frame of the plurality of frames and the hierarchical index being associated with a subset of the cue data; decoding the content data by a playback device; displaying content on a display screen from the decoded content data; receiving a user request; locating the index based on the user request; and decoding one or more media frames from the subset of the cue data based on the located index.
2. The method of claim 1 wherein the index has reference values that point directly to particular media frames of the plurality of media frames.
3. The method of claim 1 wherein the hierarchical index includes pointers to the cue data within the media file and further comprising accessing the cue data through the pointers to locate a frame to be decoded.
4. The method of claim 1 wherein the hierarchical index is associated with a subset of the plurality of media frames through a second hierarchical index and further comprising locating a subset of the second hierarchical index based on the hierarchical index relative to the user request.
5. The method of claim 4 wherein decoding one or more media frames from the subset of the plurality of media frames is based on the located subset of the second hierarchical index.
6. The method of claim 1 wherein the media file has an upper hierarchical index associated with the hierarchical index and the hierarchical index associated with one or more media frames from the subset of the plurality of media frames and wherein the upper hierarchical index has a greater number of associations with one or more media frames than the hierarchical index.
7. A method of encoding a media file for playing back by a playback device, comprising: incorporating seek locations with content data, the content data having audio, video and subtitle tracks, the seek locations identifying particular timestamps for particular frames within the audio, video or subtitle tracks; incorporating an index referencing the incorporated seek locations; and creating a media file having the content data, the incorporated seek locations, and the incorporated index.
8. The method of claim 7 wherein incorporating the index further comprises interleaving an index between audio and video tracks.
9. The method of claim 8 wherein the index points only to key frames within the audio, video or subtitle tracks.
10. The method of claim 9 wherein the index contains locations and timestamps to key frames.
11. A system for playback of a media file, comprising: a media server configured to transmit a media file having at least one index having pointers that reference at least one frame of media content within the media file; and a client processor in network communication with the media server and configured to send requests for the media file to the media server, the media server configured to transmit the media file requested, the client processor comprises a playback engine configured to locate a pointer from the at least one index to locate and decode the portions of the media file based on the located pointer to comply with user playback instructions.
12. The system of claim 11 wherein the at least one index contains pointers to key frames only.
13. The system of claim 11 wherein the at least one index contains pointers to another index that contains pointers that reference at least one frame of media content within the media file.
14. The system of claim 11 wherein the at least one index comprises a plurality of pointers, each pointer referencing a frame of media content and the plurality of pointers in its entirety referencing each frame of media content.
15. The system of claim 11 wherein the at least one index comprises a coarse index having pointers referencing frames of media content and a fine index having pointers referencing frame of the media content, the fine index having more pointers than the coarse index.
16. The system of claim 11 wherein the at least one index comprises a reduced index having pointers only to access frames of the media content.